ASYMPTOTICS FOR CONFIGURAL ESTIMATORS

Stephan Morgenthaler
M.I.T.

Statistics  Center 

1.  Introduction 

Configural polysampling denotes a method of estimation
which is geared to small sample sizes and produces "robust"
methods (see Pregibon and Tukey (1980)). There are important
differences to the robustness philosophy as developed by
Huber (see Huber (1964)). Since in small samples the
distributions of estimators are quite intractable, one has to rely
on numerical methods in order to evaluate even relatively
simple performance summaries like the mean-square-error.
This holds true except in some simple cases, like the
Gaussian sampling model, where a few expectations can be
evaluated in closed form. In this connection it is important
for the statistical community to realize that numerical
methods are perfectly acceptable. They do, however, limit
the number of sampling situations we can take into
consideration. This is in contrast to an asymptotic approach,
where for simple models an infinity of sampling situations
can be considered simultaneously (Huber (1964)).


Pitman (1938), for example, solves the small sample problem
for a single sampling situation in a location and scale
setting. In this paper we will show what happens if Pitman's
method is extended to two sampling situations, and we will
address the question of the asymptotic performance of such
estimators.

An asymptotic analysis is the simplest way to learn
something about the behavior of an estimator in a variety of
sampling situations. But it only gives a partial answer and
we should not forget the more important approach based on
performing "experiments" for small sample sizes. This paper,
however, will restrict attention to asymptotic discussions.

In Section 2 we will introduce the idea of compromise
estimators and discuss their optimality properties. Section
3 contains the corresponding asymptotic theory. As an
example we define a compromise estimator which is asymptotically
everywhere at least as good as Huber's minimax estimator.

2.  Configural  Estimators 
2.1. Pitman's Estimator

Let $x_1, x_2, \dots, x_n$ be $n$ observations in an i.i.d.
sampling situation from $F(x - \mu)$, where $1 - F(x) = F(-x)$ for all
$x$. We also assume that $F(x) \neq 0$ or $1$ for any finite $x$ and
furthermore that $F()$ has density $f()$ with respect to Lebesgue
measure.

We restrict attention to symmetric sampling situations in
order to avoid the issue of what "parameter" we try to
estimate. Symmetry of the underlying distribution allows us to
define a target, namely $\mu$ = center of symmetry. Furthermore
we will not get into any discussions if later on we allow
for two, or many, different sampling situations. The
center of symmetry is well defined for all symmetric shapes,
which means the estimation of $\mu$ is a well defined problem
for a large class of sampling situations.


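The solution Pitman gives is

$$T_F(x_1, \dots, x_n) = - \frac{\int_{-\infty}^{\infty} r \prod_{i=1}^{n} f(x_i + r)\, dr}{\int_{-\infty}^{\infty} \prod_{i=1}^{n} f(x_i + r)\, dr} \tag{2.1}$$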


(see Pitman (1938)). This estimator has the smallest
mean-square-error among all location equivariant estimators.
Location equivariance is a very reasonable restriction on a
location estimator $T()$; it means that

$$T(x_1 + r, \dots, x_n + r) = T(x_1, \dots, x_n) + r, \quad r \in \mathbb{R}, \tag{2.2}$$

i.e. the estimator changes in the same way as the sample.
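
As a rough numerical illustration of (2.1) and (2.2), the following sketch approximates the two integrals by a Riemann sum on a finite grid of $r$ values. The function names, the sample, the grid width (`margin`) and the number of grid points are assumptions made for this example only; they are not part of the original derivation.

```python
import math

def log_gauss(x):
    """Log density of the standard Gaussian shape."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

def log_cauchy(x):
    """Log density of the standard Cauchy shape."""
    return -math.log(math.pi * (1.0 + x * x))

def pitman_estimate(xs, log_density, margin=30.0, steps=6001):
    """Approximate (2.1) by a Riemann sum over a grid of r values.

    The grid is an assumption of this sketch: it covers the region where
    prod f(x_i + r) is non-negligible, plus a generous margin.
    """
    lo, hi = -max(xs) - margin, -min(xs) + margin
    h = (hi - lo) / (steps - 1)
    grid = [lo + j * h for j in range(steps)]
    loglik = [sum(log_density(x + r) for x in xs) for r in grid]
    top = max(loglik)                      # rescale to avoid underflow
    lik = [math.exp(v - top) for v in loglik]
    # The grid spacing h cancels between numerator and denominator.
    return -sum(r * w for r, w in zip(grid, lik)) / sum(lik)

sample = [0.3, -1.2, 2.5, 0.9, 4.0]        # hypothetical data
shifted = [x + 2.0 for x in sample]

print(pitman_estimate(sample, log_gauss))   # Gaussian shape: the arithmetic mean
print(pitman_estimate(sample, log_cauchy))  # Cauchy shape
print(pitman_estimate(shifted, log_cauchy)
      - pitman_estimate(sample, log_cauchy))  # equivariance (2.2): shift of 2
```

For the Gaussian shape the estimate reduces to the arithmetic mean, and shifting the sample by a constant shifts the estimate by the same constant, as (2.2) requires.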


remarks: (1) The most revealing way of deriving (2.1) is
through the concept of "configurations". By this notion we
mean the pattern of the points in our (ordered) sample and
it is easily seen that this is an ancillary statistic. The
Pitman estimator then is chosen such that conditioned on the
configuration the estimate is unbiased. Since the
conditional variance cannot be affected by the choice of the
estimate (under equivariance), this has to produce the
smallest mean-square-error.

(2) The conditions on $f()$ such that (2.1) exists are
discussed in Pitman (1938).

Formula (2.1) produces an estimator of the center of
symmetry $\mu$ no matter what the underlying sampling situation.
It therefore need not be so that the $x_i$'s are sampled from
$F(x - \mu)$.

Let us therefore introduce $G(x - \mu)$, again with $G(x) = 1 - G(-x)$
for all $x$, as the sampling situation for $x_1, \dots, x_n$.
This is a new way of looking at the Pitman estimator $T_F$ and
it of course immediately lets us see the optimality property
in a new light. If e.g. $F = \Phi$ and $G$ = Cauchy we are looking
at the behavior of the arithmetic mean under Cauchy
sampling. If we are open minded about the assumptions we base
our inference on, we have to admit that in small samples we
cannot with any reasonable precision know what the
underlying sampling situation is, nor should we attempt to make
inferences about it. Huber (1964) formalizes the idea of a
robust method as a procedure which "behaves well" in the
neighborhood of a parametric model. Huber therefore would
allow $G()$ to be chosen somewhere near $F()$ and he modifies $T_F$
in such a way that the behavior of the new estimate is
acceptable for all allowed $G()$'s. This leads us away from
considering estimates like $T_F$ which are optimized at a
single "point". Since, in small samples, we will never be
able to tell at which "point" we are, it ought to be obvious
that single-point-optimization is a bad strategy.

2.2. Compromise Estimators

Let us now consider the case where $x_1, \dots, x_n$ is a
sample from either $F_1(x - \mu)$ or $F_2(x - \mu)$, where $F_1$ and $F_2$ satisfy
all the constraints of $F$ (see beginning of Section 2.1). We
are now interested in location equivariant estimators which
optimize at two "points", namely $F_1$ and $F_2$, simultaneously.
This is obviously impossible. However, decision theory
teaches us that estimates of the form

$$T_{F_1, F_2, \alpha}(x_1, \dots, x_n) = - \frac{\int r \left\{ \alpha \prod_{i=1}^{n} f_1(x_i + r) + (1-\alpha) \prod_{i=1}^{n} f_2(x_i + r) \right\} dr}{\int \left\{ \alpha \prod_{i=1}^{n} f_1(x_i + r) + (1-\alpha) \prod_{i=1}^{n} f_2(x_i + r) \right\} dr} \tag{2.3}$$

($0 < \alpha < 1$) are bi-optimal in the sense that they cannot be
improved in both sampling situations $F_1$ and $F_2$
simultaneously (see Ferguson (1968)).
remarks: (1) We can also write

$$T_{F_1, F_2, \alpha}(x_1, \dots, x_n) = \alpha\, w_{F_1}(x_1, \dots, x_n)\, T_{F_1}(x_1, \dots, x_n) + (1-\alpha)\, w_{F_2}(x_1, \dots, x_n)\, T_{F_2}(x_1, \dots, x_n) \tag{2.4}$$

where

$$w_{F_k}(x_1, \dots, x_n) = \frac{\int \prod_{i=1}^{n} f_k(x_i + r)\, dr}{\int \left\{ \alpha \prod_{i=1}^{n} f_1(x_i + r) + (1-\alpha) \prod_{i=1}^{n} f_2(x_i + r) \right\} dr}$$

($k = 1, 2$) and $T_{F_k}()$ is defined in (2.1). We therefore can
interpret the family of bi-optimal estimators as a weighted
mean of the single-situation optimal estimators. Note,
however, that the weights are "adaptive": they depend on the
sample values. Of course any equivariant estimator can be
represented as a weighted mean of the single-situation
optimal estimators. What matters here is the simplicity and
form of the weights together with their small sample
optimality property.
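
The equivalence of (2.3) and (2.4) can be checked numerically. The sketch below is only an illustration under stated assumptions: the second shape is taken to be the Laplace (double exponential) density, the sample is hypothetical, and the integrals are replaced by Riemann sums on a finite grid. None of these choices come from the text.

```python
import math

def log_gauss(x):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

def log_laplace(x):
    # Hypothetical second shape f_2, chosen only for illustration.
    return -math.log(2.0) - abs(x)

def integrals(xs, log_density, grid):
    """Riemann sums for int prod f(x_i + r) dr and int r prod f(x_i + r) dr."""
    h = grid[1] - grid[0]
    lik = [math.exp(sum(log_density(x + r) for x in xs)) for r in grid]
    return h * sum(lik), h * sum(r * v for r, v in zip(grid, lik))

def compromise(xs, alpha, margin=40.0, steps=8001):
    lo, hi = -max(xs) - margin, -min(xs) + margin
    h = (hi - lo) / (steps - 1)
    grid = [lo + j * h for j in range(steps)]
    i1, m1 = integrals(xs, log_gauss, grid)
    i2, m2 = integrals(xs, log_laplace, grid)
    denom = alpha * i1 + (1.0 - alpha) * i2
    direct = -(alpha * m1 + (1.0 - alpha) * m2) / denom        # formula (2.3)
    w1, w2 = i1 / denom, i2 / denom                            # weights in (2.4)
    t1, t2 = -m1 / i1, -m2 / i2                                # (2.1) for each shape
    weighted = alpha * w1 * t1 + (1.0 - alpha) * w2 * t2       # formula (2.4)
    return direct, weighted, (w1, w2), (t1, t2)

sample = [0.1, -0.4, 0.7, 0.2, 6.0]       # hypothetical data with one outlier
direct, weighted, (w1, w2), (t1, t2) = compromise(sample, alpha=0.5)
print(direct, weighted)                    # the two forms agree
print(0.5 * w1, 0.5 * w2)                  # adaptive weights, summing to one
print(t1, t2)                              # single-situation estimates
```

Since $\alpha w_{F_1} + (1-\alpha) w_{F_2} = 1$, the compromise estimate always lies between the two single-situation estimates; how close it sits to either one depends on the sample through the weights.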


(2) It is clear from (2.3) that $T_{F_1, F_2, 0} = T_{F_2}$ and
$T_{F_1, F_2, 1} = T_{F_1}$.


(3) The picture which helps us most in understanding the
compromise estimators is shown in Figure 2.1.

[Figure 2.1: Plot of the mean-square-error in the two situations; axis label: m.s.e. in $F_2$.]

Note that since we only consider location equivariant
estimators the risk in any given situation does not depend on
the parameter value $\mu$ (see Ferguson (1968)). The bi-optimal
or compromise estimators are the ones which lie on the
convex boundary curve.

(4) A Bayesian interpretation of the estimator (2.3) is
possible. In that framework $(\alpha, 1 - \alpha)$ is a prior distribution
on the set of underlying sampling shapes.

(5) In order to implement (2.3) in an actual application,
the formula (2.4) has some interesting interpretations.
Pregibon and Tukey (1980) derive the formulas from a sampling
point of view. This leads to the consideration of
different weights $w_{F_1}$ and $w_{F_2}$.

The choice of the two compromising distributions $F_1$ and
$F_2$ is of importance in actual applications of the technique.
In many applications it is traditional to consider $F_1 = \Phi$,
the Gaussian shape. The choice of $F_2$ is somewhat related to
the choice of the contamination parameter $\varepsilon$ in Huber's
model. $F_2$ will influence two aspects (see (2.4)):

(i) the "relative weights" $w_{F_1}$ and $w_{F_2}$

(ii) the "other" optimal estimator $T_{F_2}$.

These two aspects have an interpretation in the theory of
M-estimators. The first is connected with the choice of
tuning constants, like $k$ in Huber's $\psi$-function
($\psi(x) = \max(-k, \min(k, x))$), and the second with the shape of the
$\psi$-function. From small sample studies we know for example that
a redescending $\psi$-function is advantageous; it costs little


In this section we are going to explore what happens to
compromise estimators (see (2.3) or (2.4)) if we sample from
a distribution $G()$ and let the sample size $n$ grow. We will
see that the weights $w_{F_1}$ and $w_{F_2}$ usually tend to (0,1) or
(1,0), respectively. A compromise estimator for large sample
sizes therefore will be close to either the optimal estimate
under $F_1$ or the optimal estimate under $F_2$. This is a
reasonable behavior since the "information" about the sampling
situation $G()$ grows as the sample size gets large. The
distinction between $F_1$ and $F_2$ is therefore more and more
estimable. In a few words then, we can say that compromise
estimators exhibit an adaptive behavior with the relative
weights $w_{F_1}$ and $w_{F_2}$ (see (2.4)) gauging the adaptation.


3.1. Behavior of the Relative Weights


Suppose $x_1, \dots, x_n$ is a sample of size $n$ from $G(x - \mu)$.
We assume that $G()$ is symmetric around 0. The relative
weights are defined as

$$w_{F_k}(x_1, \dots, x_n) = \frac{\int \prod_{i=1}^{n} f_k(x_i + r)\, dr}{\int \left\{ \alpha \prod_{i=1}^{n} f_1(x_i + r) + (1-\alpha) \prod_{i=1}^{n} f_2(x_i + r) \right\} dr} \tag{3.1}$$

($k = 1$ or $2$), where the notation is the same as in (2.4).

The following Lemma treats an "overly nice" case.

Lemma 3.1

Let us assume that both $-\log f_1(x)$ and $-\log f_2(x)$ are convex.
And let us furthermore
assume that $G()$ is such that the functions

$$A_1(r) = \int \log f_1(x + r)\, dG(x)$$

and

$$A_2(r) = \int \log f_2(x + r)\, dG(x)$$

exist for all $r$ and achieve a unique maximum at $r = 0$. If

$$\int \log f_1(x)\, dG(x) > \int \log f_2(x)\, dG(x) \tag{3.2}$$

it follows that

$$\frac{w_{F_2}(x_1, \dots, x_n)}{w_{F_1}(x_1, \dots, x_n)} \longrightarrow 0 \quad \text{a.s.}$$


proof: Let $X_1, X_2, \dots$ denote a sequence of iid random
variables with common distribution $G(x)$. From (3.1) we have

$$\frac{w_{F_2}(X_1, \dots, X_n)}{w_{F_1}(X_1, \dots, X_n)} = \frac{\int \prod_{i=1}^{n} f_2(X_i + r)\, dr}{\int \prod_{i=1}^{n} f_1(X_i + r)\, dr}.$$

Now

$$I(X_1, \dots, X_n) = \int \prod_{i=1}^{n} f(X_i + r)\, dr = \int \exp\Bigl(n \Bigl(\frac{1}{n} \sum_{i=1}^{n} \log f(X_i + r)\Bigr)\Bigr) dr = \int \exp(n A_n(r))\, dr,$$

where

$$A_n(r) = \frac{1}{n} \sum_{i=1}^{n} \log(f(X_i + r))$$

and $f$ stands for either $f_1$ or $f_2$.

Since we are interested in the large sample behavior of
$I(X_1, \dots, X_n)$ we can use an asymptotic expansion argument to
approximate $I()$.

We know that for large $n$

$$I(X_1, \dots, X_n) \approx \int \exp\Bigl(n \Bigl\{A_n(R_n^0) + \tfrac{1}{2} A''_n(R_n^0)\,(r - R_n^0)^2\Bigr\}\Bigr) dr = \exp(n A_n(R_n^0))\, \frac{(2\pi)^{1/2}}{(-n A''_n(R_n^0))^{1/2}} \tag{3.3}$$

where $R_n^0$ denotes the point where the (random) function $A_n()$
takes its maximal value (so that $A''_n(R_n^0) < 0$). Such a single
maximum exists because of our convexity assumptions.

The theory of asymptotic expansions is treated for example
in Chapter 6 of Dingle (1973).

If we blend the probability structure which underlies the
sequence $X_1, X_2, \dots$ (due to iid sampling from $G()$) with the
asymptotic approximation (3.3), we can say something about
the behavior of the right hand side in (3.3). Because of the
strong law of large numbers and our assumptions we have

$$A_n(r) \longrightarrow A(r) \quad \text{a.s. for all } r \tag{3.4}$$

and from this we can conclude that the maximizing point of
$A_n()$ tends a.s. to the maximizing point $r = 0$ of $A()$,
so that

$$\frac{1}{n} \log I(X_1, \dots, X_n) \longrightarrow A(0) = \int \log f(x)\, dG(x) \quad \text{a.s.}$$

where $f()$ denotes either $f_1()$ or $f_2()$ (see (3.3)).

We therefore conclude from

$$\int \log f_1(x)\, dG(x) > \int \log f_2(x)\, dG(x)$$

that

$$\frac{1}{n} \log I_1(X_1, \dots, X_n) - \frac{1}{n} \log I_2(X_1, \dots, X_n) = \frac{1}{n} \log \frac{w_{F_1}(X_1, \dots, X_n)}{w_{F_2}(X_1, \dots, X_n)} \longrightarrow \text{constant} > 0 \quad \text{a.s.}$$

where $I_1(X_1, \dots, X_n)$, $I_2(X_1, \dots, X_n)$ refer to $f() = f_1()$, $f() = f_2()$,
respectively.

From this last statement the Lemma follows immediately.
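
The behavior described in Lemma 3.1 can be observed in a small simulation. The following sketch makes several assumptions that are not in the text: $f_1$ is Gaussian, $f_2$ is the Laplace density (both have convex $-\log f$), the data are drawn from $G = \Phi$ with a fixed seed, and the integrals over $r$ are truncated to a finite grid. For this choice $\int \log f_2\, d\Phi - \int \log f_1\, d\Phi \approx -0.072$, so $\frac{1}{n} \log (w_{F_2}/w_{F_1})$ should settle near that value.

```python
import math
import random

def log_gauss(x):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

def log_laplace(x):
    # Hypothetical choice of f_2 for this illustration.
    return -math.log(2.0) - abs(x)

def log_integral(xs, log_density, lo=-1.5, hi=1.5, steps=601):
    """log of int prod f(x_i + r) dr, Riemann sum on [lo, hi] with log-sum-exp.

    Truncating to [lo, hi] is an assumption that is harmless here because
    the integrand concentrates near r = 0 for samples centered at 0.
    """
    h = (hi - lo) / (steps - 1)
    vals = [sum(log_density(x + lo + j * h) for x in xs) for j in range(steps)]
    top = max(vals)
    return top + math.log(h * sum(math.exp(v - top) for v in vals))

def log_weight_ratio(xs):
    """log(w_F2 / w_F1) = log I_2 - log I_1, see (3.1)."""
    return log_integral(xs, log_laplace) - log_integral(xs, log_gauss)

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
for n in (100, 1000):
    ratio = log_weight_ratio(data[:n])
    print(n, ratio, ratio / n)   # log ratio and its per-observation rate
```

The log ratio decreases roughly linearly in $n$, which means the ratio of weights itself goes to zero exponentially fast.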


Lemma 3.1 is not good enough for our purposes since the
assumption about $A_1(r)$ is not always satisfied. If for
example $f_1 = \varphi$ and $G()$ has an infinite second moment then
$A_1(r) = -\infty$ for all $r$. But we would of course still expect
that Lemma 3.1 holds; in fact we would hope for very fast
convergence of the ratio of relative weights.


Lemma 3.2

Let us assume that both $-\log f_1(x)$ and $-\log f_2(x)$ are convex.
And let us furthermore
assume that $G()$ is such that the function

$$A_2(r) = \int \log f_2(x + r)\, dG(x)$$

exists for all $r$ and achieves a unique maximum at $r = 0$.
If $\int \log f_1(x)\, dG(x) = -\infty$ then it follows that

$$\frac{w_{F_1}(x_1, \dots, x_n)}{w_{F_2}(x_1, \dots, x_n)} \longrightarrow 0 \quad \text{a.s.}$$

proof: Use the same argument as in the proof of Lemma 3.1
and note that, for $n$ large enough,

$$\frac{1}{n} \log I_1(X_1, \dots, X_n) < \text{any constant} \quad \text{a.s.}$$

remarks: (1) The asymptotic expansion (3.3) shows how
closely the maximum likelihood estimator is connected to the
Pitman estimator. Note that $-A''(0)$ is equal to the Fisher
information if $G() = F()$, i.e. the sampling situation and
the modelling situation agree. We will see below that the
maximum likelihood estimator for $F_1$ is indeed asymptotically
identical to the Pitman estimator $T_{F_1}$.

(2) It is reasonable to believe that Lemma 3.1 holds in
greater generality. The convexity conditions on the log
densities are probably not needed.

Corollary 3.1 Under the assumptions of Lemma 3.1 or
Lemma 3.2 and if

$$\int \log f_1(x)\, dG(x) > \int \log f_2(x)\, dG(x)$$

then the compromise estimator $T_{F_1, F_2, \alpha}$ ($\alpha \neq 0$) is
asymptotically equivalent to the Pitman estimator $T_{F_1}$.

proof: Apply the Lemmas to Formula (2.4).


remarks: (1) Corollary 3.1 states that as the sample size
increases the compromise estimator will pick either one of
the two single-situation-optimal estimates depending on
(3.2).

We therefore expect that

$$\int \log f_1(x)\, dG(x) - \int \log f_2(x)\, dG(x) = \int \log \frac{f_1(x)}{f_2(x)}\, dG(x) \tag{3.5}$$

is a quantity which decides whether the sampling situation $G$
is "closer" to the modelling situation $F_1$ or the modelling
situation $F_2$.

The quantity (3.5) is closely related to the
Kullback-Leibler mean information for discrimination (Kullback and
Leibler (1951)). Their formula is

$$I(1:2) = \int \log \Bigl( \frac{f_1(x)}{f_2(x)} \Bigr) f_1(x)\, dx,$$

where $I(1:2)$ is the mean information for discrimination per
observation from sampling situation $F_1$.
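
To make (3.5) concrete, the sketch below evaluates it numerically for one hypothetical pair of shapes: $f_1$ Gaussian and $f_2$ Laplace. These shapes, the integration range and the step size are assumptions of the example. When $G = F_1$ the quantity coincides with $I(1:2)$ and is positive; when $G = F_2$ it is negative.

```python
import math

def log_gauss(x):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

def log_laplace(x):
    # Hypothetical second modelling shape.
    return -math.log(2.0) - abs(x)

def discrimination(log_f1, log_f2, g_density, lo=-30.0, hi=30.0, steps=30001):
    """Riemann-sum approximation of (3.5): int log(f1/f2) dG."""
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for j in range(steps):
        x = lo + j * h
        total += (log_f1(x) - log_f2(x)) * g_density(x)
    return h * total

def gauss_density(x):
    return math.exp(log_gauss(x))

def laplace_density(x):
    return math.exp(log_laplace(x))

# G = F_1: this is the Kullback-Leibler quantity I(1:2), hence positive.
under_gauss = discrimination(log_gauss, log_laplace, gauss_density)
# G = F_2: the sign flips, the sample looks "closer" to F_2.
under_laplace = discrimination(log_gauss, log_laplace, laplace_density)
print(under_gauss, under_laplace)
```

In closed form the two values are $\log 2 + \sqrt{2/\pi} - \tfrac12 - \tfrac12 \log(2\pi)$ and $\log 2 - \tfrac12 \log(2\pi)$, which the numerical integration should reproduce to several digits.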

(2) The asymptotic behavior of the compromise estimators
(2.3) does not depend on $\alpha$ (unless (3.5) $= 0$).

(3) More results about Pitman estimators can be found in
Johns (1979) and Klaassen (1981). George Easton has proved
the results given in Section 3.1 for the more general case
of unknown scale (Easton (1984)).

3.2.  Asymptotics  of  the  Pitman  Estimator 


In order to get asymptotic efficiencies for the
compromise estimators we need to know more about the
asymptotic behavior of the Pitman estimators $T_{F_1}$ and $T_{F_2}$. Port
and Stone (1974) provide the information in the case where
the sampling situation and the modelling situation are
identical. In our more general setup we can argue the
following way:

$$T_F(X_1, \dots, X_n) = - \frac{\int r \exp(n A_n(r))\, dr}{\int \exp(n A_n(r))\, dr},$$

where

$$A_n(r) = \frac{1}{n} \sum_{i=1}^{n} \log f(X_i + r)$$

and $f()$ is the density of $F()$.
If we expand the numerator asymptotically we get

$$T_F(X_1, \dots, X_n) \approx - \frac{\exp(n A_n(r_n^0)) \int r \exp\bigl\{\tfrac{n}{2} A''_n(r_n^0)\,(r - r_n^0)^2\bigr\}\, dr}{\int \exp(n A_n(r))\, dr}$$

where $r_n^0$ maximizes $A_n(r)$. By (3.3) the right hand side is
approximately $-r_n^0$. We therefore showed that
asymptotically the Pitman estimator and the maximum likelihood
estimator ($= -r_n^0$) agree. This agreement is good enough to
conclude that the asymptotic distributions are the same. Huber
(1967) then provides the necessary results.

3.3. Huber's Contamination Model: An Example

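To illustrate the use of the theory we developed let us
look at the compromise estimators based on the two modelling
densities

$$f_1(x) = \varphi(x) = \frac{1}{(2\pi)^{1/2}} \exp(-\tfrac{1}{2} x^2)$$

$$f_2(x) = \begin{cases} (1-\varepsilon)\, \varphi(x) & \text{if } |x| \le k \\ \dfrac{1-\varepsilon}{(2\pi)^{1/2}} \exp\bigl(\tfrac{k^2}{2} - k|x|\bigr) & \text{otherwise} \end{cases}$$

(where $k$ is such that $\dfrac{2\varphi(k)}{k} - 2\Phi(-k) = \dfrac{\varepsilon}{1-\varepsilon}$).

The alternative density is of course the least favorable
choice in the class of distributions

$\{(1 - \varepsilon)\Phi() + \varepsilon H() : H() \text{ symmetric}\}$ (see Huber (1964)).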

The asymptotic variance of an estimator compromising
between these two symmetric situations (see (2.3)) will be
equal to either of the asymptotic variances of the Pitman
estimators

$T_{F_1}$ = arithmetic mean

or

$T_{F_2}$ = Pitman estimator for the least favorable distribution.

If we sample from distribution $G()$ we have for these
asymptotic variances ($\mu_G = \int x\, dG(x)$)

$$\text{as. var}_G(T_{F_1}) = \int (x - \mu_G)^2\, dG(x)$$

$$\text{as. var}_G(T_{F_2}) = \frac{\int (\psi_k(x - \mu_G))^2\, dG(x)}{\bigl(\int \psi'_k(x - \mu_G)\, dG(x)\bigr)^2}$$

where $\psi_k(x) = -f'_2(x)/f_2(x) = \max(-k, \min(k, x))$.
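
The two variance formulas are easy to evaluate once $k$ has been obtained from $\varepsilon$. The sketch below solves the defining equation for $k$ by bisection and evaluates both variances for one particular, hypothetical contamination: $H$ puts mass 1/2 on each of the points $-c$ and $+c$ with $c > k$. This choice of $H$, the function names and the bisection bracket are assumptions of the example.

```python
import math

def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def huber_k(eps):
    """Solve 2*phi(k)/k - 2*Phi(-k) = eps/(1-eps) for k by bisection."""
    target = eps / (1.0 - eps)
    lo, hi = 1e-8, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * phi(mid) / mid - 2.0 * Phi(-mid) > target:
            lo = mid          # the left hand side is decreasing in k
        else:
            hi = mid
    return 0.5 * (lo + hi)

def asymptotic_variances(eps, c):
    """Variances under G = (1-eps)*Phi + eps*H, H = mass 1/2 at -c and +c (c > k)."""
    k = huber_k(eps)
    var_mean = (1.0 - eps) + eps * c * c
    # int psi_k^2 dG: Gaussian part plus k^2 for every contaminating point.
    num = (1.0 - eps) * (1.0 - 2.0 * Phi(-k) - 2.0 * k * phi(k)
                         + 2.0 * k * k * Phi(-k)) + eps * k * k
    # int psi_k' dG: probability of the interval (-k, k) under G.
    den = (1.0 - eps) * (1.0 - 2.0 * Phi(-k))
    return k, var_mean, num / den ** 2

k, var_mean, var_huber = asymptotic_variances(eps=0.1, c=1.2)
print(k, var_mean, var_huber)
```

For this kind of $H$ the numerator and the denominator (before squaring) coincide, so the Huber variance reduces to $1 / \{(1-\varepsilon)(1 - 2\Phi(-k))\}$ and does not depend on $c$, while the variance of the mean grows with $c^2$.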


In his 1964 paper Huber shows that the M-estimator
based on $\psi_k$ is asymptotically minimax for sampling
situations chosen from the contamination class. Since $T_{F_2}$ has the
same asymptotic behavior as this M-estimator, the same claim
can be made for $T_{F_2}$. Note, however, that for finite sample
sizes $T_{F_2}$ will be superior. The following Proposition
explains the asymptotic behavior of the compromise
estimators (see (2.3)).


Proposition 3.1 Let $G(x) = (1-\varepsilon)\Phi(x) + \varepsilon H(x)$ where
$H(x) + H(-x) = 1$ for all $x$ and $H()$ puts all its mass outside the
interval $[-k, k]$, but is otherwise arbitrary. Furthermore
assume that $0 < \varepsilon \le 0.5$. Then

$$\text{as. var}_G(\text{compromise estimator}) \le \text{as. var}_G(\text{Huber's minimax estimator})$$

proof: From Lemma 3.1 and Lemma 3.2 we know that

$$\begin{aligned}
\int \log f_1(x)\, dG(x) &- \int \log f_2(x)\, dG(x) \\
&= \int \Bigl\{ \log \frac{1}{(2\pi)^{1/2}} - \tfrac{1}{2} x^2 \Bigr\}\, dG(x) - \int \log \frac{1-\varepsilon}{(2\pi)^{1/2}}\, dG(x) + \int_{-k}^{k} \tfrac{1}{2} x^2\, dG(x) - 2 \int_k^\infty \Bigl( \frac{k^2}{2} - k|x| \Bigr)\, dG(x) \\
&= -\log(1-\varepsilon) + 2 \int_k^\infty \Bigl\{ k|x| - \frac{k^2}{2} - \frac{x^2}{2} \Bigr\}\, dG(x)
\end{aligned} \tag{3.6}$$

is the quantity which decides about the asymptotic variance
of the compromise estimator. Note that we made use of the
symmetry of the sampling distribution $G$ in the derivation of
(3.6). If (3.6) is positive the compromise estimators will
behave asymptotically like the arithmetic mean, otherwise
like the Huber-estimator. All that remains to be considered
therefore is the case where (3.6) is positive (or zero)
because in the other case the assertion of the Proposition
is trivial.
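
The decision quantity (3.6) can be evaluated for a concrete contamination. As a hedged illustration, the sketch below takes $H$ to put mass 1/2 on each of $-c$ and $+c$ (with $c > k$), uses the pointwise identity $\log f_1(x) - \log f_2(x) = -\log(1-\varepsilon) - \tfrac12 (|x| - k)_+^2$, and compares a closed-form evaluation with direct numerical integration. The choice of $H$, the grid and all function names are assumptions of the example.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def huber_k(eps):
    """Solve 2*phi(k)/k - 2*Phi(-k) = eps/(1-eps) by bisection."""
    target = eps / (1.0 - eps)
    lo, hi = 1e-8, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * phi(mid) / mid - 2.0 * Phi(-mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_ratio(x, eps, k):
    """log f_1(x) - log f_2(x) for the Gaussian and least favorable densities."""
    excess = max(abs(x) - k, 0.0)
    return -math.log(1.0 - eps) - 0.5 * excess * excess

def decision_closed_form(eps, c):
    """(3.6) for H = mass 1/2 at -c and +c, using int_k^inf (x-k)^2 phi(x) dx."""
    k = huber_k(eps)
    gauss_part = (1.0 + k * k) * Phi(-k) - k * phi(k)
    return -math.log(1.0 - eps) - (1.0 - eps) * gauss_part - 0.5 * eps * (c - k) ** 2

def decision_numeric(eps, c, lo=-12.0, hi=12.0, steps=24001):
    """Same quantity, integrating log(f_1/f_2) against G on a grid."""
    k = huber_k(eps)
    h = (hi - lo) / (steps - 1)
    gauss = h * sum(log_ratio(lo + j * h, eps, k) * phi(lo + j * h)
                    for j in range(steps))
    return (1.0 - eps) * gauss + eps * log_ratio(c, eps, k)

for c in (1.2, 5.0):
    print(c, decision_closed_form(0.1, c), decision_numeric(0.1, c))
```

With $\varepsilon = 0.1$, contamination close to $k$ (here $c = 1.2$) gives a positive value, so the compromise estimator asymptotically follows the mean; remote contamination ($c = 5$) gives a negative value, and it follows the Huber-type estimator.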

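First note that (3.6) can only be positive if $G$ has
finite variance. Using our assumptions about
$G = (1-\varepsilon)\Phi + \varepsilon H$ stated in the Proposition, (3.6) can be written
as

$$\begin{aligned}
& -\log(1-\varepsilon) + 2(1-\varepsilon) \int_k^\infty \Bigl( k|x| - \frac{k^2}{2} - \frac{x^2}{2} \Bigr) \varphi(x)\, dx + \varepsilon \int \Bigl( k|x| - \frac{k^2}{2} - \frac{x^2}{2} \Bigr)\, dH(x) \\
&= -\log(1-\varepsilon) - (1-\varepsilon) \int_k^\infty (x-k)^2 \varphi(x)\, dx - \frac{\varepsilon}{2} \int (|x|-k)^2\, dH(x) \\
&= -\log(1-\varepsilon) - (1-\varepsilon) \bigl\{ (1+k^2)\Phi(-k) - k\varphi(k) \bigr\} - \frac{\varepsilon}{2} k^2 + \varepsilon k \int |x|\, dH(x) - \frac{\varepsilon}{2} \sigma_H^2
\end{aligned} \tag{3.7}$$

where $\sigma_H^2 = \int x^2\, dH(x)$ is the variance of the contaminating
distribution.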

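A comparison of the asymptotic variances of the sample
mean and Huber's estimator is not hard. We have

$$\text{as. var}_G(\text{sample mean}) = (1-\varepsilon) + \varepsilon \sigma_H^2 \tag{3.8}$$

$$\text{as. var}_G(\text{Huber-estimator}) = \frac{\int (\psi_k(x))^2\, dG(x)}{\bigl(\int \psi'_k(x)\, dG(x)\bigr)^2} = \frac{\int_{-k}^{k} x^2\, dG(x) + 2 \int_k^\infty k^2\, dG(x)}{\bigl(\int_{-k}^{k} dG(x)\bigr)^2}$$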


>  Cl--*)  (l-2#(-k))  ♦ 

>  (l-«) (l-2*(-k))  +  log (jij) 2  >  (3.8) 
if  only  we  show  that 

>  log (^J  2 


(3.10) 


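holds. This last inequality is only true for $\varepsilon$ small enough,
e.g. $\varepsilon \le 0.5$. For such $\varepsilon$ values we have

$$\log \Bigl( \frac{1}{1-\varepsilon} \Bigr)^2 \le 3\varepsilon \quad (0 \le \varepsilon \le 0.5)$$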

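and (3.10) is therefore proved if we show that its left hand
side exceeds $3\varepsilon$. Using
$\varepsilon / (1-\varepsilon) = 2\varphi(k)/k - 2\Phi(-k)$, this is equivalent to

$$2k\Phi(-k) > \tfrac{2}{3} \varphi(k) \iff 3k\Phi(-k) > \varphi(k) \quad \text{for all } 0.436 < k < \infty. \tag{3.11}$$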


Note that the range of $\varepsilon$ values from 0 to 0.5 translates
into a range of values for $k$.

This last inequality (3.11) which is equivalent to
(3.10) does indeed hold and is left for the reader to check.
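
A quick numerical check of (3.11) is sketched below. It evaluates $3k\Phi(-k)$ and $\varphi(k)$ on a grid of $k$ values between 0.437 and 8, and also maps $k$ back to $\varepsilon$ through the defining relation. The grid, its upper end and the helper names are assumptions of this example; a grid check is of course not a proof.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def inequality_holds(k):
    """Inequality (3.11): 3 k Phi(-k) > phi(k)."""
    return 3.0 * k * Phi(-k) > phi(k)

def eps_of_k(k):
    """Invert eps/(1-eps) = 2*phi(k)/k - 2*Phi(-k) for eps."""
    t = 2.0 * phi(k) / k - 2.0 * Phi(-k)
    return t / (1.0 + t)

grid = [0.436 + 0.001 * j for j in range(1, 7565)]   # 0.437 up to 8.0
print(all(inequality_holds(k) for k in grid))         # holds on the whole grid
print(eps_of_k(0.4363))                               # close to eps = 0.5
print(inequality_holds(0.3))                          # fails for smaller k
```

The last line shows why the restriction on $k$ (equivalently $\varepsilon \le 0.5$) matters: for sufficiently small $k$ the inequality no longer holds.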

Proposition 3.1 is now proved for all the cases where
(3.6) is strictly positive. Some care is needed if (3.6) is
zero. Then the compromise estimator is asymptotically a
convex linear combination of $T_{F_1}$ and $T_{F_2}$, but since the
asymptotic variance of $T_{F_1}$ is lower than the asymptotic variance
of $T_{F_2}$ the compromise estimator will have an asymptotic
variance below the asymptotic variance of $T_{F_2}$.

remarks: (1) We have identified a class of sampling
situations $G$, namely those where (3.6) is positive, for which the
mean is a more efficient estimator than Huber's minimax
estimator. It would be of interest to show how big this class is
and also to check whether it contains all sampling
situations for which the sample mean is asymptotically better
than Huber's minimax estimator.

4.  Discussion 

This paper deals with estimators which compromise
between different "shapes". This idea, as we have seen,
produces "robust" estimators. If we compromise between the
Gaussian and Huber's least favorable distribution we have a
family of estimators (for different values of $\alpha$) which
dominate Huber's minimax M-estimator asymptotically.

Several points need to be clarified, however. The idea of
compromising is different from the usual asymptotic
robustness theory as developed by Huber (see Huber (1964) and
Huber (1981)). There, the compromising takes place in a
neighborhood of the "central" model, whereas in our approach
the different shapes need not be close together. A
neighborhood model is in fact only a first step towards
robust/resistant techniques for small sample sizes. For
samples of size 5 we would advise to compromise between the
Gaussian and something like the slash (= distribution of a
ratio of a Gaussian over an independent uniform) rather than
using the only moderately tailed least favorable
distribution.

The intention of this paper is not to show that we should
use a compromise between the Gaussian and the least
favorable distribution, but rather to let people know of the
merits of compromise estimators in a language which many
statisticians are used to, namely asymptotics.

Results found through small sample experiments are of
greater importance. It is clear for example that the
situations (or shapes) we compromise ought to change with the
sample size. The amount of "information" in the sample grows
with the sample size. Not only are we able to estimate
"parameters" with less variability, we also gain insight
into the underlying shape. Compromise estimators use this
knowledge in an optimal way and with our choice of the
shapes we can fine-tune the procedure. Important choices
have to be made in that respect and more (probably
experimental) research for small sample sizes is needed. Subject
matter knowledge might prove useful in this connection.

The  extension  of  Pitman's  ideas  to  more  than  one  shape 
provides  us  with  a  tool  to  find  meaningful  small  sample 
methods  of  the  robust/resistant  kind.  In  order  to  make  the 
asymptotics  simple  we  did  not  deal  with  the  scale  parameter. 
In  actual  applications  the  inclusion  of  this  additional 
parameter  is,  however,  no  problem  (see  Bell  and  Morgenthaler 
(1981) for an example).